Diagnostic Accuracy of Fecal Calprotectin for Predicting Relapse in Inflammatory Bowel Disease: A Meta-Analysis

Fecal calprotectin (FC) levels correlate with the disease activity of inflammatory bowel disease (IBD); however, the utility of FC in predicting IBD relapse remains to be determined. We aimed to evaluate the accuracy of FC in predicting relapse of IBD. We searched PubMed (MEDLINE), Embase, Web of Science, and the Cochrane Library up to 7 July 2021. We estimated the pooled sensitivity and specificity, the summary receiver operating characteristic (SROC) curve, and the optimal cut-off value for predicting IBD relapse using a multiple thresholds model. A total of 24 prospective studies were included in the meta-analysis. The optimal FC cut-off value was 152 µg/g, at which the pooled sensitivity and specificity of FC were 0.720 (0.528 to 0.856) and 0.740 (0.618 to 0.834), respectively. FC is a useful, non-invasive, and inexpensive biomarker for the early prediction of IBD relapse. An FC value of 152 µg/g is an ideal threshold for identifying patients with a high probability of relapse.


Introduction
Inflammatory bowel diseases (IBD) are chronic gastrointestinal disorders with a remitting and relapsing course and are associated with multiple complications. The incidence of IBD has increased in industrialized countries, accompanied by rising healthcare expenditure and poor quality of life [1]. Ulcerative colitis (UC) and Crohn's disease (CD) are the two main types of IBD. Because the clinical course of IBD remains unpredictable, serum and fecal biomarkers that predict relapse are urgently needed, so that appropriate measures can be taken to reduce complications [2,3].
Endoscopy plays an essential role in the diagnosis, management, prognosis, and surveillance of IBD [4,5]. However, in routine practice, endoscopic assessment of disease severity is relatively expensive and invasive. In addition, from the patients' perspective, endoscopy is the least acceptable form of monitoring [6]. Accurate tests that are practical, non-invasive, and inexpensive would therefore be ideal. Several promising serologic and fecal biomarkers could fulfill this role, including fecal calprotectin (FC), C-reactive protein (CRP), and erythrocyte sedimentation rate (ESR) [7]. CRP and ESR are useful for confirming ongoing mucosal inflammation but are of less value for predicting a future relapse, since elevations in these markers have not been found to precede a clinical flare [7]. Furthermore, CRP production varies considerably between patients depending on their genetics [8]. These limitations have encouraged the development of alternative tests, specifically stool biomarkers with higher specificity for intestinal inflammation.
FC is an excellent marker of intestinal inflammation. Calprotectin is a calcium- and zinc-binding protein formed by a heteromeric complex of two subunits, S100A8 and

Data Synthesis and Analysis
To obtain the summary receiver operating characteristic (SROC) curve and an optimal cut-off for predicting IBD relapse, we applied the multiple thresholds model. This model incorporates every reported cut-off value together with its true positive, true negative, false positive, and false negative counts. The multiple thresholds model is a new approach to the meta-analysis of diagnostic test accuracy studies. It is designed for settings in which some studies report more than one threshold with the corresponding sensitivity and specificity. It estimates the distribution functions of the biomarker in nondiseased and diseased individuals under a common parametric assumption (normal or logistic) for the distribution of a continuous biomarker, using a mixed-effects model with study as a random factor [16]. The optimal cut-off was defined as the point at which the Youden index (sensitivity + specificity − 1) was maximized [17]. Individual studies were weighted by inverse variance when estimating mean values. The model that minimized the restricted maximum likelihood criterion was selected as the best-fitting model. In addition, we used random-effects bivariate models to calculate pooled sensitivity and specificity, both overall and in the subgroup analyses. Forest plots were created for all studies.
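The cut-off step can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it omits the random study effects. It assumes the fitted model has been reduced to normal distributions of log-transformed FC in patients who relapse and in those who stay in remission, choosing the normal option of the normal/logistic assumption. The means and standard deviations used are hypothetical values, not estimates from this meta-analysis. Under that assumption, sensitivity and specificity become smooth functions of the cut-off, and the Youden index can be maximized numerically over a log-spaced grid.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2), via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sens_spec(cutoff: float, mu_relapse: float, sd_relapse: float,
              mu_remission: float, sd_remission: float) -> tuple[float, float]:
    """Sensitivity and specificity of the rule 'FC > cutoff', assuming
    log(FC) is normally distributed in each group (hypothetical binormal model)."""
    x = math.log(cutoff)
    sensitivity = 1.0 - normal_cdf(x, mu_relapse, sd_relapse)  # P(FC > c | relapse)
    specificity = normal_cdf(x, mu_remission, sd_remission)    # P(FC <= c | no relapse)
    return sensitivity, specificity


def youden_optimal_cutoff(mu_relapse: float, sd_relapse: float,
                          mu_remission: float, sd_remission: float,
                          low: float = 10.0, high: float = 1000.0,
                          steps: int = 10_000) -> tuple[float, float, float, float]:
    """Log-spaced grid search (ug/g) for the cut-off maximizing
    J = sensitivity + specificity - 1. Returns (cutoff, J, sens, spec)."""
    best = (low, -1.0, 0.0, 0.0)
    for i in range(steps + 1):
        c = low * (high / low) ** (i / steps)
        se, sp = sens_spec(c, mu_relapse, sd_relapse, mu_remission, sd_remission)
        j = se + sp - 1.0
        if j > best[1]:
            best = (c, j, se, sp)
    return best


# Hypothetical parameters: median FC of 300 ug/g in patients who relapse and
# 75 ug/g in those who stay in remission, with equal spread on the log scale.
cutoff, j, se, sp = youden_optimal_cutoff(math.log(300), 1.0, math.log(75), 1.0)
print(f"optimal cut-off ~{cutoff:.0f} ug/g: J={j:.3f}, "
      f"sensitivity={se:.3f}, specificity={sp:.3f}")
```

With equal log-scale spreads, the optimum falls at the geometric midpoint of the two medians (about 150 ug/g with these made-up parameters), where sensitivity equals specificity.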
To explore the clinical utility of FC for predicting IBD relapse, we constructed a Fagan nomogram. This nomogram graphically portrays the relationship between the pre-test probability, the likelihood ratio, and the post-test probability [18]. Pre-test probabilities of 25%, 50%, and 75% were compared, representing three clinical scenarios: (1) low suspicion of relapse, 25%; (2) high suspicion of relapse, 75%; and (3) worst-case scenario, 50%.
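The arithmetic behind a Fagan nomogram is Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The Python sketch below computes it for the three pre-test probabilities above. It uses only the pooled point estimates reported in the Results and ignores their confidence intervals. The function names are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios of a dichotomized test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


def fagan_table(sensitivity: float, specificity: float,
                priors: tuple[float, ...] = (0.25, 0.50, 0.75)
                ) -> dict[float, tuple[float, float]]:
    """Map each pre-test probability to (post-test prob. if FC positive, if FC negative)."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return {p: (post_test_probability(p, lr_pos), post_test_probability(p, lr_neg))
            for p in priors}


# Pooled estimates from the Results: multiple thresholds model and bivariate model.
for label, se, sp in [("multiple thresholds", 0.720, 0.740), ("bivariate", 0.80, 0.78)]:
    for prior, (pos, neg) in fagan_table(se, sp).items():
        print(f"{label}: pre-test {prior:.0%} -> positive {pos:.1%}, negative {neg:.1%}")
```

The two sets of estimates give different results:

- **Bivariate estimates:** a 25% pre-test probability gives a negative post-test probability of about 7.9%, and a 75% pre-test probability gives a positive post-test probability of about 91.6%. These round to the 8% and 92% reported in the Results.
- **Multiple thresholds estimates:** the corresponding values are about 11.2% and 89.3%.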
Additionally, we calculated positive predictive values (PPVs) and negative predictive values (NPVs) for different cut-off values across a range of relapse rates, using a linear mixed-effects model for the multiple thresholds model [19].
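Why PPV and NPV depend on the relapse rate can be seen from a short Python sketch. It applies Bayes' theorem to the pooled point estimates at 152 µg/g (sensitivity 0.720, specificity 0.740). It is a simplification: unlike the mixed-effects approach cited above, it ignores the uncertainty in sensitivity and specificity, and the relapse rates looped over are arbitrary examples.

```python
def predictive_values(sensitivity: float, specificity: float,
                      relapse_rate: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sensitivity * relapse_rate                  # true positives per patient tested
    fn = (1.0 - sensitivity) * relapse_rate          # false negatives
    tn = specificity * (1.0 - relapse_rate)          # true negatives
    fp = (1.0 - specificity) * (1.0 - relapse_rate)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Point estimates at 152 ug/g from the multiple thresholds model.
SENS, SPEC = 0.720, 0.740
for rate in (0.05, 0.10, 0.25, 0.50, 0.75):
    ppv, npv = predictive_values(SENS, SPEC, rate)
    print(f"relapse rate {rate:.0%}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

At a 5% relapse rate, this gives an NPV of about 0.980. As the relapse rate rises, PPV increases and NPV falls.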
Although the funnel plot is the standard graphical method for detecting publication bias, it is not recommended for diagnostic test accuracy meta-analyses because of the multiple thresholds. We therefore did not assess publication bias [20].
Because thresholds vary between studies, it was essential to assess how close the observed results lie to the receiver operating characteristic (ROC) curve, rather than how dispersed they are in ROC space [21]. The magnitude of heterogeneity is best assessed graphically, from the dispersion of the study points and the closeness between the 95% prediction region and the 95% confidence region of the SROC curve [22]. We performed subgroup analyses by disease type, follow-up time, reference standard, and FC assay.

Selection, Characteristics, and Quality of Studies
Our initial search yielded 3209 papers, and a further 30 papers were identified by reviewing the reference lists of relevant articles. After removing duplicates and screening titles and abstracts, 396 studies were selected for full-text review. Of these, 356 were excluded at this stage, and another 16 were excluded after data extraction and discussion. The reasons for excluding each study are listed in Table 1. Finally, 24 studies with a total of 2260 patients, of whom 715 relapsed, were included (Figure 1). Table 2 summarizes the characteristics of the included studies. All studies used a prospective design and enrolled patients with quiescent IBD at baseline. Of the included studies:

- 7/24 (29.2%) [40–46] involved only patients with CD.
- 7/24 (29.2%) [47–53] involved only patients with UC.
- The remaining 10/24 (41.7%) [7,54–62] included patients with both UC and CD.

FC was measured at baseline, and relapse was identified by clinical symptoms and/or endoscopic findings during follow-up. The follow-up period varied between studies, as shown in Table 2. The definitions of relapse used in each study are listed in Table 3. Because these definitions varied, we grouped them into two broad categories for analysis: clinical relapse and endoscopic relapse. Five of 24 studies (20.8%) used endoscopy as the reference standard, and 19/24 (79.2%) used clinical symptoms or a change in therapy. Cut-off values for predicting relapse ranged from 50 to 500 µg/g, mostly between 100 and 250 µg/g.
Table 3 (excerpt). Definitions of relapse.
UC: (1) significant increase in clinical activity indices above accepted cut-offs for remission in UC (Simple Colitis Activity Index ≥ 3) and/or (2) step-up in the patient's therapeutic regimen, including surgery for intractable disease-related symptoms.
CD: (1) significant increase in clinical activity indices above accepted cut-offs for remission in CD (Harvey-Bradshaw Index [HBI] ≥ 5) and/or (2) step-up in the patient's therapeutic regimen, including surgery for intractable disease-related symptoms.
L. Ye 2017 [42], CD: worsening symptoms requiring intensified therapy or surgery, or a Crohn's Disease Activity Index (CDAI) score > 150, with confirmation by ileocolonoscopy.

Overall, the quality of the included studies was good (see the QUADAS-2 results in Supplementary Table S4). Eleven studies [42,44,45,47,48,51–53,56,58,62] did not state whether patients were enrolled consecutively. Blinding to reference standard results was reported in all but one study [61]. Four studies [46,47,50,51,61] reported blinding to index test results, while the others did not mention it.

Performance of FC at the Optimal Cut-Off Value
Primary Outcome
Figure 2 presents forest plots of sensitivity (true positive rate) and 1 − specificity (false positive rate) for the 24 studies. Combining all available data from the 24 studies in the multiple thresholds model yielded the SROC curve shown in Figure 3. The Youden index reached its maximum at 152 µg/g (Supplementary Figure S1), which was therefore identified as the optimal cut-off value. At this cut-off, the sensitivity and specificity were 0.720 (95% CI 0.528 to 0.856) and 0.740 (95% CI 0.618 to 0.834), respectively. The area under the SROC curve (AUC) for predicting IBD relapse was 0.794.
We also applied the bivariate model to evaluate the diagnostic performance of FC, using data from a single cut-off per study. When a study reported several cut-off values, we selected the one closest to 152 µg/g, the optimum identified by the multiple thresholds model. The selected cut-off values ranged from 50 to 340 µg/g. The corresponding pooled sensitivity, specificity, and AUC were 0.80 (0.73 to 0.85), 0.78 (0.73 to 0.82), and 0.85 (0.82 to 0.88), respectively. The SROC curve for the bivariate model is shown in Supplementary Figure S2.
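The one-cut-off-per-study selection rule can be sketched in Python as follows. The study names and 2×2 counts are hypothetical and are not data from the included studies. The tie-breaking rule (first listed cut-off wins) is an assumption, since the text does not say how ties would be resolved. The pooling step itself, a random-effects bivariate model usually fitted with dedicated meta-analysis software, is not shown.

```python
from dataclasses import dataclass


@dataclass
class ThresholdResult:
    cutoff: float  # FC cut-off in ug/g
    tp: int
    fp: int
    fn: int
    tn: int


# Hypothetical per-study 2x2 counts (not data from the included studies).
STUDIES = {
    "study_A": [ThresholdResult(100, 30, 20, 5, 45),
                ThresholdResult(150, 27, 12, 8, 53),
                ThresholdResult(250, 20, 6, 15, 59)],
    "study_B": [ThresholdResult(130, 18, 15, 6, 61)],
    "study_C": [ThresholdResult(50, 40, 35, 2, 23),
                ThresholdResult(200, 30, 10, 12, 48)],
}


def closest_to(results: list[ThresholdResult], target: float = 152.0) -> ThresholdResult:
    """Pick the reported cut-off nearest the target; on a tie, the first listed wins."""
    return min(results, key=lambda r: abs(r.cutoff - target))


selected = {name: closest_to(results) for name, results in STUDIES.items()}
for name, r in selected.items():
    sensitivity = r.tp / (r.tp + r.fn)
    specificity = r.tn / (r.tn + r.fp)
    print(f"{name}: cut-off {r.cutoff} ug/g, sens={sensitivity:.2f}, spec={specificity:.2f}")
```

Each study contributes exactly one sensitivity/specificity pair to the bivariate model. A study reporting a single cut-off (study_B here) is used as is, however far that cut-off lies from 152 µg/g.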

Post-Test Probability of Relapse
In clinical practice, clinicians need to know the probability that a patient with quiescent IBD will relapse when the FC result exceeds a given threshold. Because PPV and NPV depend on prevalence, they vary with the relapse rate. We therefore used the multiple thresholds model to calculate PPVs and NPVs at the optimal and other common cut-off values for different relapse rates (Supplementary Table S5). At an FC threshold of 152 µg/g, the highest NPV (0.98) was observed in a low-relapse setting, i.e., when the relapse rate was no more than 5%. The highest PPV (0.893) was observed in a high-relapse setting (72.5%). To further characterize the predictive value of FC at 152 µg/g, we calculated post-test probabilities at three relapse rates. The Fagan nomogram showed that FC testing changed the post-test probability of IBD relapse (Figure 4). When suspicion of relapse was low, a negative result lowered the post-test probability to 8%, which may be considered sufficient to rule out a high likelihood of relapse. Conversely, when suspicion of relapse was high, a positive result raised the post-test probability to 92%, which may be considered sufficient to warn of relapse within 24 months.

Subgroup Analysis
To determine whether disease type (CD or UC), follow-up time (<1 year or ≥1 year), reference standard (clinical or endoscopic), or FC assay (BÜHLMANN fCAL® ELISA, Calprest®, or Human Calprotectin ELISA Kit, Cell Sciences Inc., Newburyport, MA, USA) were sources of heterogeneity, we performed subgroup analyses. Summary performance was similar across all subgroups (Supplementary Table S6).

Discussion
Our meta-analysis aimed to identify an optimal cut-off value for predicting IBD relapse that is suitable for clinical use. We intended to derive separate thresholds for UC and CD using the multiple thresholds model, but there were not enough studies to do so. However, subgroup analysis with random-effects bivariate models showed that sensitivity and specificity did not change when the results were stratified by disease type (UC vs. CD). We therefore calculated a single cut-off value for the IBD group as a whole (patients with CD and UC). At a cut-off of 152 µg/g, the pooled sensitivity of FC was 0.720 (95% CI 0.528-0.856) and the pooled specificity was 0.740 (95% CI 0.618-0.834), with an AUC of 0.794. The estimated diagnostic odds ratio (DOR) is 14, indicating that FC is a useful biomarker for predicting the relapse of IBD.
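The DOR can be computed directly from a sensitivity/specificity pair as LR+ / LR−. The Python sketch below does this for the two sets of pooled point estimates given in the Results: 0.720/0.740 from the multiple thresholds model and 0.80/0.78 from the bivariate model. It is plain arithmetic on point estimates, with no confidence intervals.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = LR+ / LR- = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    return (sensitivity / (1.0 - sensitivity)) / ((1.0 - specificity) / specificity)


dor_multiple_thresholds = diagnostic_odds_ratio(0.720, 0.740)
dor_bivariate = diagnostic_odds_ratio(0.80, 0.78)
print(f"multiple thresholds model: DOR = {dor_multiple_thresholds:.1f}")
print(f"bivariate model: DOR = {dor_bivariate:.1f}")
```

By this arithmetic, the 0.720/0.740 pair gives a DOR of about 7.3, and the 0.80/0.78 pair gives about 14.2. The DOR of 14 quoted above therefore appears to correspond to the bivariate estimates.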

Implications of Key Findings
Heida et al. [63] suggested that FC levels during remission should be used to predict relapse in patients with IBD. FC is inexpensive and non-invasive and has better specificity than CRP. FC is remarkably stable in stool for up to 7 days at room temperature, enabling sample collection at home, even for patients living in remote locations. These characteristics of FC may make monitoring IBD patients convenient and practical. Based on our results, we suggest that 152 µg/g is an appropriate threshold for monitoring IBD. Patients with higher FC levels (>152 µg/g) should be warned of the possibility of relapse within 24 months.
We used the comprehensive GRADE approach [64] to examine the validity of our results (Table 4). Central to this approach is treating the accuracy of FC as a surrogate for outcomes that are important to patients. Measuring FC to predict relapse in patients with IBD is valuable only if FC monitoring improves their care. We therefore inferred the effect of FC testing on monitoring for IBD relapse from the pooled sensitivity and specificity. The key question is whether the numbers of false negatives (cases in which the risk of relapse is underestimated) and false positives (cases in which the risk of relapse is overestimated) are acceptable in this context.
Table 4 (notes). Detriment: unnecessary psychological burden and financial expenditure. CI: confidence interval. * GRADE recommends classifying patient-important outcomes on a 9-point scale: 7-9, critical for decision making; 4-6, important but not critical for decision making; and 1-3, of lower importance to patients.
Consider a hypothetical population of 100 adults with IBD in remission and an overall relapse rate of 25%:

- **True positives (18 patients):** they should have FC tested more frequently, be monitored continuously, and, if necessary, undergo endoscopy to confirm relapse.
- **True negatives (56 patients):** they have a low risk of future relapse and need only be observed according to the original plan; FC testing reduces their psychological burden.
- **False negatives (7 patients):** they will be missed, and a false-negative FC result can delay recognition of their risk of relapse and delay treatment.
- **False positives (19 patients):** these results cause inconvenience, unnecessary financial expense, and some psychological stress.

When judging whether an FC result is reliable, factors that can produce false-positive and false-negative results should be excluded (Table 5). Based on this comprehensive analysis, FC is still recommended for predicting relapse in patients with IBD.
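The counts in this hypothetical population follow from multiplying the number of patients who will and will not relapse by the pooled sensitivity (0.720) and specificity (0.740). The short Python sketch below shows the calculation. The function name is illustrative, and the population size and relapse rate are the hypothetical values used in the text.

```python
def expected_outcomes(n: int, relapse_rate: float,
                      sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected 2x2 counts when n patients in remission are tested once."""
    relapsers = n * relapse_rate
    non_relapsers = n - relapsers
    return {
        "true_positive": relapsers * sensitivity,
        "false_negative": relapsers * (1.0 - sensitivity),
        "true_negative": non_relapsers * specificity,
        "false_positive": non_relapsers * (1.0 - specificity),
    }


counts = expected_outcomes(100, 0.25, 0.720, 0.740)
for outcome, value in counts.items():
    print(f"{outcome}: {value:.1f}")
```

The true-negative and false-positive counts come out as 55.5 and 19.5. The text rounds these to 56 and 19 so that the four groups sum to 100.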

Comparison with Other Reviews
Previous meta-analyses have evaluated the performance of FC for predicting relapse in patients with IBD [64–67]. However, an optimal FC cut-off value was never determined. To our knowledge, this is the first meta-analysis to derive an optimal FC cut-off value for distinguishing patients who will relapse in the near future, which makes it more useful for clinical practice. YS Tham et al. [67] suggested that an FC cut-off value of 150 µg/g is associated with optimal diagnostic accuracy for postoperative endoscopic recurrence in CD. However, the pooled performance at an FC level of 135 µg/g was not examined. Although 135 µg/g was the optimal cut-off value in the largest cohort they included, it appeared in only one cohort, which was not sufficient to obtain a pooled estimate. Li et al. [64] showed that, in patients with UC, the accuracy of FC was better in studies with a cut-off of ≥150 µg/g, but they did not discuss the optimal cut-off value. In contrast, we used the novel multiple thresholds model to derive the optimal cut-off value by maximizing the Youden index, rather than by comparing predefined cut-off values. This is also the first cut-off value derived for IBD as a whole, and subgroup analysis showed similar results for UC and CD. This cut-off value is therefore more convenient to use in clinical practice.
Additionally, the diagnostic performance estimated with the multiple thresholds model is somewhat lower, but more realistic, than that obtained with traditional approaches. This is because the multiple thresholds model uses all the available cut-off information to estimate the performance of the biomarker, which avoids the drawback of relying on a single cut-off value. Previous meta-analyses used only one pair of sensitivity and specificity per study. This may overestimate the SROC curve because of cut-off selection bias, since studies generally report their 'optimal' point [68].
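The cut-off selection bias described above can be illustrated with a small Python simulation. It is a hypothetical sketch unrelated to the included studies, and the distributions, sample size, and random seed are arbitrary assumptions. Each simulated study draws log-FC values from two normal distributions. The simulation then compares two quantities:

- the in-sample Youden index at the true optimal cut-off, fixed in advance;
- the in-sample Youden index at the cut-off chosen to maximize it within the same study, as when a study reports only its own "optimal" point.

```python
import math
import random
import statistics


def empirical_youden(relapse: list[float], remission: list[float], cutoff: float) -> float:
    """In-sample Youden index of the rule 'value > cutoff'."""
    sensitivity = sum(x > cutoff for x in relapse) / len(relapse)
    specificity = sum(x <= cutoff for x in remission) / len(remission)
    return sensitivity + specificity - 1.0


def best_empirical_youden(relapse: list[float], remission: list[float]) -> float:
    """Youden index at the cut-off chosen to maximize it in the same sample,
    mimicking a study that reports only its own 'optimal' cut-off."""
    return max(empirical_youden(relapse, remission, c) for c in relapse + remission)


def simulate(n_studies: int = 500, n_per_group: int = 20,
             seed: int = 1) -> tuple[float, float]:
    """Mean in-sample Youden index at a prespecified vs. a data-driven cut-off."""
    rng = random.Random(seed)
    fixed, optimized = [], []
    for _ in range(n_studies):
        # Hypothetical log-FC values: N(1, 1) if relapse, N(0, 1) if remission,
        # so the true optimal cut-off is 0.5.
        relapse = [rng.gauss(1.0, 1.0) for _ in range(n_per_group)]
        remission = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        fixed.append(empirical_youden(relapse, remission, 0.5))
        optimized.append(best_empirical_youden(relapse, remission))
    return statistics.mean(fixed), statistics.mean(optimized)


true_j = math.erf(0.5 / math.sqrt(2.0))  # 2*Phi(0.5) - 1, about 0.383
mean_fixed, mean_optimized = simulate()
print(f"true J = {true_j:.3f}; prespecified cut-off: {mean_fixed:.3f}; "
      f"data-driven cut-off: {mean_optimized:.3f}")
```

The two approaches behave differently:

- **Prespecified cut-off:** the average Youden index stays close to its true value.
- **Data-driven cut-off:** the average is systematically higher. This optimism carries over into any SROC curve built from self-selected "optimal" points.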

Limitations of the Study
We would also like to report some limitations of this study:

- **Reference standard:** the reference standard for diagnosing IBD relapse is still controversial; however, the reference standards used in the included studies are currently recommended.
- **Quality assessment:** QUADAS-2 was not designed to assess prognostic tests, but it is still the most suitable available tool. We omitted some non-applicable items, which may have introduced some deviation in the assessment.
- **Medications:** because of a lack of data, we could not perform a subgroup analysis by the medications used by these patients.
- **Patient factors:** the optimal threshold may vary slightly between patients, and clinicians sometimes need to consider the extent of disease involvement and the patient's history alongside the given threshold.
- **Age:** a recent study reported that FC may increase with age, by as much as three- to four-fold [13], suggesting that cut-off values may differ between age groups.

Despite these limitations, our analysis is rigorous and should stimulate further high-quality studies of FC for predicting relapse in patients with IBD.

Conclusions
In conclusion, regular measurement of FC during IBD remission is a useful tool for the early prediction of relapse. An FC value of 152 µg/g is an ideal threshold for identifying patients with a high probability of relapse, in whom careful follow-up and medication adjustment should be considered. Moreover, because it requires no bowel preparation, this noninvasive monitoring method is likely to be better accepted by patients than colonoscopy, while offering reasonable sensitivity and specificity. Further high-quality prospective trials are needed to determine the optimal interval between FC measurements and the cut-off value for FC trends. It would also be useful to study the predictive performance of combined markers, such as CRP and FC, for relapse in patients with IBD.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm12031206/s1, Figure S1: Youden index of FC for predicting IBD relapse; Figure S2: Receiver operating characteristic graph of the fecal calprotectin test at remission for predicting relapse in inflammatory bowel disease, with 95% confidence region and 95% prediction region; Table S1: The PRISMA checklist; Table S2: The search strategies of four databases; Table S3: Specific criteria of QUADAS-2; Table S4: Quality assessment of included studies; Table S5: Calculated sensitivities and specificities at cut-offs of 160, 50, 150 in predicting relapse and their corresponding PPVs and NPVs for different prevalences using the multiple thresholds model; and Table S6: Assessment of diagnostic accuracy in subgroup analysis.